[DRAFT] Add support for lambda column capture #21323
gstvg wants to merge 50 commits into apache:main
    } else if let Some(lambda_variable) =
        expr.as_any().downcast_ref::<LambdaVariable>()
    {
        used_column_indices.insert(lambda_variable.index());
I'm 98% sure this has a bug: lambda variable indices can conflict with column indices. Even if you separate lambda variable indices from column indices, you can still run into problems with nested lambdas, when an inner lambda uses a variable from an outer one.
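To make the concern concrete, here is a minimal sketch of the collision, assuming columns and lambda variables are numbered in independent index spaces. `Leaf` and `collect_used_indices` are hypothetical stand-ins written for this illustration; they are not DataFusion types or the code under review.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an expression leaf (not a DataFusion type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Leaf {
    /// A column of the input batch, by index.
    Column(usize),
    /// A lambda parameter, by index (assumed here to be numbered separately).
    LambdaVariable(usize),
}

/// Collects every referenced index into a single set, mirroring the shape of
/// the snippet under review where both leaf kinds feed one collection.
fn collect_used_indices(leaves: &[Leaf]) -> BTreeSet<usize> {
    leaves
        .iter()
        .map(|leaf| match leaf {
            Leaf::Column(i) | Leaf::LambdaVariable(i) => *i,
        })
        .collect()
}
```

With `[Leaf::Column(0), Leaf::LambdaVariable(0)]` as input, the set holds a single `0`, so a later projection step cannot tell which of the two the index refers to.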
I added a sqllogictest test that I hope covers all the cases you cited and more (4932cae). In your snippet at #21231 (comment), lambda variables come first in the scoped schema and external columns follow them. Here, lambda variables are instead pushed to the end of the outer schema, which still includes unreferenced columns. If there is a name conflict (a lambda variable shadows a field from the outer schema), we rename the shadowed field to a unique name (5c5ca19#diff-a3e127629e9516ec496d656ebb53a1e8bf730eb02d219c4ce42ee47572685844R253-R325, 5c5ca19#diff-7fb0a64e734f54d94d48e9e02c51573a3678205f9ee8e2afaf41d686187a285eR440-R489).

That way, once a field has been introduced into the schema, whether it is a column of the outermost schema or a lambda variable of an inner schema, its index never changes, regardless of how many new scopes are derived from it further down the tree. Because of that, the casewhen optimization (as well as the same optimization in lambdas) can safely collect all indices and assume that any index out of bounds for the scoped batch it is projecting refers to an inner lambda variable that is not yet available.

It still needs to rewrite all of them, since they were originally computed against the unprojected, full schema. Any projection of an outer schema affects the indices of all its derived inner schemas, so the rewrite must be propagated down the tree for every projection (inner projections cannot know how to rewrite the indices of an outer projection).
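A rough sketch of the scheme described above, under stated assumptions: `Schema`, `push_scope`, `index_of`, and `remap_after_projection` are hypothetical names invented for this illustration and do not come from the PR, and the `__shadowed_…` renaming pattern is made up. The remapping rule (out-of-bounds indices shift down by the number of dropped outer columns) is an interpretation of the comment, not the PR's confirmed logic.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model of a scoped schema: an ordered list of field names.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

impl Schema {
    /// Derives the schema seen by a lambda body. Every outer field keeps its
    /// index, shadowed fields are renamed, and parameters go at the end.
    fn push_scope(&self, params: &[&str]) -> Schema {
        let mut fields: Vec<String> = self
            .fields
            .iter()
            .enumerate()
            .map(|(i, name)| {
                if params.contains(&name.as_str()) {
                    // Made-up renaming pattern; a real one must guarantee uniqueness.
                    format!("__shadowed_{i}_{name}")
                } else {
                    name.clone()
                }
            })
            .collect();
        fields.extend(params.iter().map(|p| p.to_string()));
        Schema { fields }
    }

    /// Returns the index of the first field with the given name, if any.
    fn index_of(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f == name)
    }
}

/// Maps each used index to its index after the batch of width `batch_width`
/// is projected down to the used in-bounds columns. Out-of-bounds indices are
/// assumed to be inner lambda variables and shift down by the dropped count.
fn remap_after_projection(
    used: &BTreeSet<usize>,
    batch_width: usize,
) -> BTreeMap<usize, usize> {
    let kept: Vec<usize> = used.iter().copied().filter(|&i| i < batch_width).collect();
    let dropped = batch_width - kept.len();
    used.iter()
        .map(|&old| {
            let new = if old < batch_width {
                kept.iter()
                    .position(|&k| k == old)
                    .expect("in-bounds index was kept")
            } else {
                old - dropped
            };
            (old, new)
        })
        .collect()
}
```

For an outer schema `[a, b, x]`, `push_scope(&["x"])` gives `[a, b, __shadowed_2_x, x]`, so `a` and `b` stay at 0 and 1 and the lambda variable `x` lands at 3. Pushing another scope with `y` leaves `x` at 3 and puts `y` at 4. If an expression uses indices `{1, 3, 4}` and the outer batch has width 3, the sketch keeps only column 1 and maps 1 to 0, 3 to 1, and 4 to 2.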
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?